Skip to content

Strengthen handling of unavailable cgroup stats#21094

Merged
jasontedor merged 6 commits intoelastic:masterfrom
jasontedor:unavailable-cgroup
Oct 24, 2016
Merged

Strengthen handling of unavailable cgroup stats#21094
jasontedor merged 6 commits intoelastic:masterfrom
jasontedor:unavailable-cgroup

Conversation

@jasontedor
Copy link
Copy Markdown
Member

On some systems, cgroups will be available but not configured. And in
some cases, cgroups will be configured, but not for the subsystems that
we are expecting (e.g., cpu and cpuacct). This commit strengthens the
handling of cgroup stats on such systems.

Relates #21029

@jasontedor jasontedor added review :Core/Infra/Stats Statistics tracking and retrieval APIs v6.0.0-alpha1 v5.1.1 labels Oct 24, 2016
@s1monw
Copy link
Copy Markdown
Contributor

s1monw commented Oct 24, 2016

should we just throw FNF and catch on a top level as we do in other cases?

On some systems, cgroups will be available but not configured. And in
some cases, cgroups will be configured, but not for the subsystems that
we are expecting (e.g., cpu and cpuacct). This commit strengthens the
handling of cgroup stats on such systems.
@jasontedor jasontedor changed the title Strength handling of unavailable cgroup stats Strengthen handling of unavailable cgroup stats Oct 24, 2016
@jasontedor
Copy link
Copy Markdown
Member Author

should we just throw FNF and catch on a top level as we do in other cases?

@s1monw We can throw if /proc/self/cgroup is not available, which we do now, but we shouldn't throw if some /sys/fs/cgroup... is not available because it might be the case that the cpu subsystem is configured but not the cpuacct subsystem. Doing it the way that I've done it at least lets us get what is available.

@s1monw
Copy link
Copy Markdown
Contributor

s1monw commented Oct 24, 2016

@jason ok can we check this higher up to skip the entire query of this subsystem

@jasontedor
Copy link
Copy Markdown
Member Author

ok can we check this higher up to skip the entire query of this subsystem

Sorry, I wasn't completely clear in my previous comment. We do already check if the subsystem is available or not, but it might still be the case that the necessary mounts to /sys/fs/cgroup/ are not available and this can vary between subsystems (whether or not they are configured, whether or not the mounts to /sys/fs/cgroup are available).

This commit checks for the existence of cgroup stats upfront rather than
allowing leniency in the cgroup methods.
@jasontedor
Copy link
Copy Markdown
Member Author

@s1monw I pushed 29ed3ba which might be more to your tastes? 😄

@s1monw
Copy link
Copy Markdown
Contributor

s1monw commented Oct 24, 2016

looks great I would be even happier if we had a static method that says boolean isCgroupsAvailable() and add the empty check back?

@s1monw
Copy link
Copy Markdown
Contributor

s1monw commented Oct 24, 2016

@jasontedor I am ok with pushing this since some builds are failing but I think we can do better with a nice util method?

This commit refactors the logic for the upfront checking whether the
relevant cgroup stats are available or not, and adds a note to the docs.
@jasontedor
Copy link
Copy Markdown
Member Author

I am ok with pushing this since some builds are failing but I think we can do better with a nice util method?

I pushed 4c8a2cc.

This commit adds Javadocs for a convenience method
OsProbe#areCgroupStatsAvailable.
Copy link
Copy Markdown
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some questions, LGTM otherwise

final String[] controllers = matcher.group(1).split(",");
for (final String controller : controllers) {
controllerMap.put(controller, matcher.group(2));
if (lines.isEmpty()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this obsolete now? we don't return empty form this method?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed 8c7d7be.

* @return {@code true} if the stats are available, otherwise
* {@code false}
*/
@SuppressForbidden(reason = "access /proc/self/cgroup, /sys/fs/cgroup/cpu, and /sys/fs/cgroup/cpuacct")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

getCgroupCpuAcctCpuCfsPeriodMicros(cpuControlGroup),
getCGroupCpuAcctCpuCfsQuotaMicros(cpuControlGroup),
getCgroupCpuAcctCpuStat(cpuControlGroup));
if (!areCgroupStatsAvailable()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Override
public void writeTo(final StreamOutput out) throws IOException {
out.writeString(cpuAcctControlGroup);
out.writeOptionalString(cpuAcctControlGroup);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you require it non-null above but here it's optional?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was a bug, I cleaned up all the optionality in 4c8a2cc but missed this one. I pushed 1d53577.

This commit removes some leniency when reading cgroups that was leftover
after a previous refactoring.
This commit fixes a serialization bug in OsStats when serializing Cgroup
instances. Namely, the control groups were made non-optional in a
previous refactoring but the commit that cleaned this up left one case
of writing a string as optional.
@jasontedor
Copy link
Copy Markdown
Member Author

Thanks @s1monw, I've responded to your feedback.

@s1monw
Copy link
Copy Markdown
Contributor

s1monw commented Oct 24, 2016

LGTM 👍

@jasontedor jasontedor merged commit 900ee05 into elastic:master Oct 24, 2016
jasontedor added a commit that referenced this pull request Oct 24, 2016
On some systems, cgroups will be available but not configured. And in
some cases, cgroups will be configured, but not for the subsystems that
we are expecting (e.g., cpu and cpuacct). This commit strengthens the
handling of cgroup stats on such systems.

Relates #21094
@jasontedor jasontedor deleted the unavailable-cgroup branch October 24, 2016 20:39
@jasontedor
Copy link
Copy Markdown
Member Author

Thanks @s1monw.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants